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Q-t Abstract. This is a very brief summary of the techniques I used to analyze the IPTA 

challenge 1 data sets. I tried many things, and more failed than succeeded, but in the 
end I found two approaches that appear to work based on tests done using the open 
data sets. One approach works directly with the time domain data, and the other 
works with a specially constructed Fourier transform of the data. The raw data was 
run through TEMP02 to produce reduced timing residuals for the analysis. Standard 
Markov Chain Monte Carlo techniques were used to produce samples from the posterior 
distribution function for the model parameters. The model parameters include the 
gravitational wave amplitude and spectral slope, and the white noise amplitude for 
each pulsar in the array. While red timing noise was only included in Dataset 3, I 
found that it was necessary to include effective red noise in all the analyses to account 
for some of the spurious effects introduced by the TEMP02 timing fit. This added an 
additional amplitude and slope parameter for each pulsar, so my overall model for the 
36 pulsars residuals has 110 parameters. As an alternative to using an effective red 
noise model, I also tried to simultaneously re-fit the timing model model while looking 
for the gravitational wave signal, but for reasons that are not yet clear, this approach 
was not very successful. I comment briefly on ways in which the algorithms could be 
improved. My best estimates for the gravitational wave amplitudes in the three closed 
(blind) data sets are: (1) A = (7.3 ± 1.0) x 10~ 15 ; (2) A = (5.7 ±0.6) x 10" 14 ; and (3) 
A = (4.6 ± 1.3) x 10~ 15 . 
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1. Method 

A description of the IPTA data challenge and the data sets can be found at 



http://www.ipta4gvj.org/. The first round consisted of three parts - Datasets 1,2, and 
3, each with one open data set for testing purposes and one closed data set as a blind 
challenge. The datasets were arranged in increasing order of difficulty, starting with 
evenly sampled data with bright gravitational wave signals and moving on to unevenly 
sampled data with weak gravitational wave signals. The IPTA data challenges are 
similar to the Mock LISA Data Challenges [T], the main difference being that the people 
that put this challenge together have real detectors to work with as well. 

My approach to the analysis was to use Bayesian inference and Markov Chain Monte 
Carlo sampling to produce posterior distributions for the gravitational wave amplitude 
and the pulsar timing noise. The Bayesian approach requires a noise model - which fixes 
the form of the likelihood - a signal model, and priors on the model parameters. For 
stationary, Gaussian noise the likelihood is given by 

p(d\s,n) = 1 exp -- ]T r {ai) C^ mfm J, (1) 

^(27r) L detC V / 

where C is the covariance matrix, which depends on the noise in the detectors (in this 
case the individual Earth-pulsars arms) and the amplitude of the stochastic gravitation 
wave background, and r = al — s denotes the residuals after a signal model s has been 
subtracted from the data d. Here Greek indices a — 1,N label which detector (Earth- 
pulsar arm), and Latin indices i = 1,M label the data samples. This expression for 
the likelihood applies to any collection of gravitational wave detectors with stationary, 
Gaussian noise, and is widely used in theoretical studies of ground-based and space- 
based gravitational wave detectors [21 [31 HI [5]. It applies to any data representation, 
including time domain, frequency domain and wavelet domain. For evenly sampled time 
domain data with colored noise it is best to transform to the frequency domain where 
C(ai)(/3j) = C^p^Sij, as this leads to substantial savings when computing the likelihood: 
~ MN 3 versus ~ M 3 N 3 floating point operations. 

In the current setting we are looking for a stochastic gravitational wave signal, so 
we don't have a deterministic signal model for the GW component of the data. Instead, 
we are looking for correlations in the signals between Earth-pulsar pairs a, (3, which 
depend on the angle 9 a p between the line of sight to each pulsar and follow the Hellings- 
Downs [6] correlation curve H a p, which has the form 

1 3(1 - cos6> a/3 ) /l-cos0 a/9 \ 1- cos 6> Q/ 3 l v , n . 
H a p = ~ + i ln(^ J + -Sap. (2) 

We do however have a deterministic timing model that accounts for effects such as the 
pulsar spin-down, sky location and proper motion, and a host of other effects, such as 
relativistic time delays due to the Earth's motion about the Sun. The leading terms in 
the timing model have the form 

s(t) = ao + ait + a 2 t 2 + a 3 cos(a;ot + 4>o) + cos(a;o£ + (f>i) + a 5 cos(2a;o^ + 4>2) + . . . (3) 



IPTA Challenge 1 



3 



The parameters do, Oi, 02 are used to fit the pulsar spin-down, 03 and 0o account for the 
sky location and leading order time dilation effects, which introduce periodic variations 
with uo = 27r/year, a 4 and account for pulsar proper motion, while 05 and 4>2 account 
for second order time dilation effects and the eccentricity of the Earth's orbit. The 
publicly available TEMP02 EJ E] software package fits a more involved timing model 
to each individual pulsar using a chi-squared goodness of fit. Taking the TEMP02 
residuals and subtracting the raw data that was provided for the open data sets reveals 
a clear excess of power with a spectral slope of ~ —4, in addition to a wide bump around 
/ = 1/year from the 03 and 04 terms, along with a narrow spike at / = 2/year from the 
05 term. 

A naive approach would be to take the TEMP02 timing residuals and compare the 
the cross correlation between pulsar pairs to the Hellings- Downs correlation curve. The 
problem with this approach is that the TEMP02 fitting procedure introduces spurious 
features in the residuals, potentially removing or at least masking any gravitational wave 
signal that might be present. A better approach is to combine the fitting of the timing 
model with the cross-correlation analysis. I tried this approach, but the results were not 
entirely satisfactory. For some reason the recovered GW amplitude always tended to 
get biased high. I suspect that I might be missing some important contributions to the 
timing model. I plan to re-visit this approach in an effort to understand why it didn't 
work as expected. 

Simply ignoring the features introduced by TEMP02 and using a model that only 
accounts for the GW signal and white noise in Datasets 1 and 2, or the red and white 
noise of Dataset 3, leads to GW amplitudes that appear to be biased high (based on 
analysis of the open data sets). The one caveat here is that I am not 100% sure about 
wether the quoted amplitudes for the GW signal refer to the astrophysical signal or the 
effective level registered in the detector. The astrophysical level Sh is related to the 
observed level in an Earth-Pulsar link 5£ bs by 5£ bs = §5 ft (the two-thirds factor comes 
from sky and polarization averaging). This corresponds to an amplitude ratio bias of 



y 2/3 = 0.816. If the factor of two-thirds was not included, then my results for the GW 
amplitude need to be multiplied by a factor of 1.225. In any case, this factor is not 
large enough to account for the bias that I saw with the naive analysis, which showed 
a ~ 40% amplitude bias. To check my analysis codes in a more controlled fashion, I 
wrote my own simulation software, using the same approach for generating stochastic 
backgrounds that we used in the Mock LISA Data Challenges pi)] (based on a adding 
together the response to independent stochastic signals from different patches of the 
sky using a fine Healpix sky grid). This simulation confirmed that the two-thirds factor 
should be included, and using my own simulated data, with no TEMP02 processing, 
I was able to perfectly recover the injected GW amplitude and spectral slope, and the 
pulsar noise levels. 

Since the TEMP02 fitting is done on a per-pulsars basis, the errors in the timing 
models should be uncorrelated between pulsars. I therefore decided to model the 
timing model errors as an additional, uncorrelated red noise contribution to each pulsar, 
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described by a spectral amplitude S r and slope 7 r . For Dataset 1 we were told that 
all the pulsars shared the same white noise level, so for that analysis the model I used 
had 2N + 3 = 75 parameters (since there are N = 36 pulsars in the simulated array). 
For Dataset 2 the white noise level in each pulsar was different, so the model had 
3N + 2 = 110 parameters. For Dataset 3 there is an additional red noise component, but 
I assumed that this was already taken care of by the red noise model that I introduced 
to account for the TEMP02 fitting noise, so again the model had 110 parameters. 
Moreover, if I understood the statement about the red noise level that was injected, it is 
essentially irrelevant to the analysis, as it lies below the fiducial 100 ns white timing noise 
for frequencies above 10 nHz (Here I am using the fact that 100 ns white timing noise 
corresponds to a spectral density of S n = 2.417e — 8 Hz -3 , and the red noise was quoted 
at S r = 5.77e - 22/" L7 Hz" 3 , which cross at / = (5.77e - 22/2.417e - 8)°- 59 = 9.7nHz, 
which is at the very edge of the frequency band for a 5 year data set, and in a region 
where the fit to the pulsar spin-down removes any meaningful signal). In most runs 
I fixed the gravitational wave spectral slope at the expected value of 7 = —11/3 to 
improve the amplitude recovery. 

With the model established, the next step is to decide on how to compute the 
likelihood. The simplest and most robust approach is to implement (pQ) in the time 
domain, following the approach of van Haarsteren et al. [I]. The downside of this 
approach is that the matrix inversion is very computationally expensive for an L by L 
matrix with L = iVxM = 36 xl 30 = 4680 rows and columns. Since the correlation 
matrices are positive definite, they can be inverted by way of Cholesky factorization. 
The matrix determinant is given by product of the squares of the diagonal elements of 
the Cholesky decomposition. I used the LAPACK routines dpotrf and dpotri to perform 
the factorization and inversion. The codes were run on multi-core Mac Pro workstations, 
where an interesting side-effect of having the LAPACK routines as built in functions 
is that the matrix inversion is automatically parallelized, taking up an average of 4 
cores. Even with the parallel processing, each likelihood evaluation took ~ 6 seconds, 
making it difficult to produce long Markov Chains. To adequately explore the posterior 
distribution we need many samples in the chains, with the required number of samples 
scaling roughly linearly with the model dimension. The combination of the large model 
dimension and the high cost of generating the chains leads to long runs times (or the 
need to secure larger computing resources!). To shorten the "burn-in" phase of the 
analysis (where the chains locate the region of high posterior mass), I first ran on each 
pulsar individually using a three parameter model consisting of the white noise level, the 
gravitational wave amplitude (the slope was fixed at 7 = —13/3), and the low frequency 
cut-off in the spectrum. The spectral estimates from the individual pulsar runs were 
used as a starting point for the full run on the pulsar network. The low frequency cut-off 
parameter requires some explanation. The time domain auto-correlation function for 
time lags r, C(r), forms a Fourier pair with the frequency domain spectrum S(f): 




(4) 
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Setting / low = for power-law power spectra leads to improper correlation functions. 
As van Haarsteren et al. [I] point out, the correlation function should not depend very 
strongly on /i ow so long as /i ow < l/T b B , as the fitting of the pulsar spin-down model 
removes the part of the spectrum that causes the divergence in the limit /i ow — > 0. 
However, I found that the value of /i ow did affect the analysis quite significantly, so 
I made it a parameter to be determined from the data. The chains generally favored 
values around /i ow = 0.7/T obs . The correlation functions used in the full analysis have 
the form 



for a GW signal with amplitude A at / = 1/year and spectral slope 7. Here 
t = \t(ai) ~ t(Pj)\> an d the sum over n needs to extend to Q = 12 if /i ow is allowed 
to go as high as l/T obs . The injected GW signals have spectral slope 7 = —13/3. An 
expression similar to (jHJ) applies to the un-correlated red noise, but with H a p replaced 
by 6 a p. 

In the runs where I included a timing model of the form (J3J) for each pulsar, the 
MCMC alternated between updating the parameters in the correlation matrix (which 
costs of order ~ M 3 N 3 floating point operations), and updating the parameters in the 
timing model (which costs of order ~ M 2 N 2 floating point operations). Because the 
timing model updates are so much cheaper, it made sense to do many timing updates 
for each correlation matrix update. So while the timing model involves many more 
parameters than the spectral model, we are able to spend many more iterations exploring 
the much larger timing model parameter space for little additional computational cost. 
However, as I remarked earlier, the runs that used the timing model did not do as good 
a job of recovering the amplitude of the GW signal as did the runs with an uncorrelated 
red noise model. 

In an attempt to speed up the calculation of the likelihood, I also considered 
frequency domain implementations of the likelihood calculation ([T]). Transforming the 
data to the frequency domain turned out to be a highly non-trivial exercise due to the 
very red spectra and the uneven data sampling. Either one of these issues is easy to 
overcome on its own, but in tandem they are a bitch. The approach I settled on combined 
several techniques that have been used previously in pulsar timing data analysis. To 
handle the unevenly sampled data I used an tempered MCMC algorithm to fit the timing 
residuals to a sum of sines and cosines using the log likelihood 



C(ai){pj) — {5t a ) 5 a p Sij , 

for white noise with amplitude 5t a , and 



(5) 




(r( 7 + 1) cos(tt(7 + 3)/2)(27r/ low r)-^ +1 ) 




(6) 



logL = - lJ2 d i~ XX a i cos(ju}ti) + bj sm(juti)) /5t 2 , 





(7) 
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with 8t = 1 ns (the choice of <5i is largely irrelevant since I used a tempered MCMC). 
To avoid the Gibbs phenomenon from the finite-time window function and the very red 
spectra, I set u = 2-n/T with T > T b s . I found that T = l.lT b s worked well, and 
that somewhat larger and smaller values also worked. The sum over frequencies has to 
be extended beyond the usual Nyquist frequency because of the un-even time-sampling. 
Using the Fourier coefficients aj and bj directly gave reasonable results, but I found that 
it was better to re-map the data onto the true time interval with uniform time sampling, 
then transform the data with a standard FFT. To avoid the Gibbs phenomena I applied 
a zero phase shift double difference filter in the time domain: d'[i] = d[i—l]+d[i+l]—d[i], 
and compenstated by dividing the Fourier amplitudes derived from the d' sequence by 
the transfer function 2(cos(27r/At) — 1). Applying this filtering procedure to the full 
data set leads to aliasing of high frequency power to low frequencies, so I used the fit 
derived from (J7|) to first split the data into high-passed and low-passed components. 
The low frequency part was transformed with the filter, while the high frequency part 
was transformed without the filter. The results were then added together to give the 
complete Fourier transform. The sum over frequencies in ([1]) was restricted to the range 
[2/Tobs, 50/Tobs]- The lowest frequency bin was dropped as the fitting of the pulsar 
spin-down model takes out most of the very low frequency power, and including the 
l/T obs frequency bin was found to bias the GW amplitude low. I dropped the highest 
frequencies in part because they contain no information about the GW amplitude, and 
also because the interpolation of the data becomes less trustworthy one short time 
scales. For unevenly sampled data, the Fourier components are not strictly orthogonal, 
and C( a i)^j) 7^ C' a Ji)5ij. To keep the analysis simple I ignored this complication and 
assumed orthogonality. 

For the priors I used uniform distributions in the log of the spectral amplitudes, and 
uniform distributions in the spectral slope parameters. I used a very wide prior range, 
and found that the distributions stayed well within these ranges, with the exception of 
the time domain analysis of the closed Dataset3 data, where the low end of the GW 
prior was seen to cut-off the distribution, which suggests that the GW level was not 
confidently detected in that analysis. 

I found that in most of the challenge data sets the GW amplitude was sufficiently 
high that there was no question about wether a detection had been made. The one 
exception was the closed data from Dataset3. The GW amplitude flirted with zero 
during the time and frequency domain runs, indicating that a pure noise model may 
have greater Bayesian evidence. To test this I ran a trans-dimensional Reversible Jump 
Markov Chain Monte Carlo (RJMCMC) algorithm on the frequency domain data that 
allowed for transitions between a GW + noise model and a pure noise model. The ratio 
of the time that the chains spent in each model provides the Bayes factor, or evidence 
ratio for the two models. If our prior belief is that the two models are equally likely, 
then this ratio is the betting odds for one model versus the other. As we shall see, the 
odds turned out to favor the GW model by a fairly comfortable margin (but perhaps 
by not enough to send a paper to Nature if this was real data). 



IPTA Challenge 1 7 




oo r~. r*. r~ r*. 

O O O O O 0^--Tj-Tj-Tt^-Tj-TtTt 



Q) CD CD CD CDVVVVVVVV 

in ■-- LO -i- LO CMCDCDCDCDCDCDCDCD 

p t-; ^jLo^fLOLOLOcoLnr^ 

T - " CO LO CD 



Figure 1. Posterior distributions for the level of the white timing noise (left panel) 
and the dimensionless gravitational wave amplitude (right panel) for the open Datasetl 
data. 

2. Results 

I begin by showing results from the frequency domain analysis. The GW spectral slope 
was held fixed at 7 = —13/3 for the analyses shown in Figures 1-7. Figure 1 shows 
the posterior distributions for the level of the white timing noise (in seconds) and the 
dimensionless gravitational wave amplitude A using the open data from Dataset 1. 
The white noise level came out a little above the injected value of 10 -7 s, while the 
gravitational wave amplitude peaked very close to the injected value of A = 5 x 1CT 14 . 
Figure 2 shows the same quantities, but now for the closed Dataset 1 data. This time the 
white noise level was a fraction below 100 ns. The GW amplitude distribution peaked 
at A = 7.3 x 10~ 15 , which is significantly lower than the level seen in the open data set. 

Figure 3 shows the posterior distributions for the dimensionless gravitational wave 
amplitude for Dataset2 for both the open and closed data. The posterior distribution 
peaks very near the injected value of A = 5 x 10~ 14 for the open data set, and the 
distribution for the closed data set peaks a little higher at A = 5.7 x 10 -14 . 

Figure 4 shows the posterior distributions for the dimensionless gravitational wave 
amplitude for Dataset3 for both the open and closed data. The posterior distribution 
peaks very near the injected value of A = 1 x 10 -14 for the open data set, and the 
distribution for the closed data set peaks quite a bit lower at A = 4.6 x 10~ 15 . The 
standard deviation for the closed data set is a a = 1-3 x 10~ 15 , so in frequentist terms we 
have a "3.5 sigma" detection. Using a RJMCMC algorithm I found that the evidence 
ratio in favor of a gravitational wave signal being present was (180 ± 20) : 1, or a little 
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Figure 4. Posterior distributions for the gravitational wave amplitude for the Dataset3 
open (left) and closed (right) data. 



domain analysis by simply discarding the very low frequency data. A possible remedy in 
the time domain would be to modify the model for the GW power spectrum such that the 
spectrum turns over at low frequencies. I suspect that the /i ow parameter in the current 
model attempts to achieve a similar result. Unfortunately, the time domain analyses 
take days or weeks to run, so it isn't easy to try out a lot of alternative strategies. I 
present the current results from the time domain codes, but with the caveat that I am 
fairly certain that the levels are biased low by ~ 20% or more. 

Figure 5 shows the posterior distributions for the dimensionless gravitational wave 
amplitude for Datasetl for the open and closed data using a time domain analysis. 
The posterior distribution for the open data set peaks at A = 4.1 x 1CT 14 , which is 
quite a bit below the injected value. The distribution for the closed data set peaks at 
A = 5.9 x 1CT 15 , which is below the A = 7.3 x 1CT 15 value found by the frequency 
domain analysis. 

Figure 6 shows the posterior distributions for the dimensionless gravitational wave 
amplitude for Dataset2 for the open and closed data using a time domain analysis. The 
posterior distribution for the open data set peaks at A = 4.6 x 10~ 14 , which is below 
the injected value. The distribution for the closed data set peaks at A = 4.7 x 10 -14 , 
which is below the A = 5.7 x 1CT 14 value found by the frequency domain analysis. 

Figure 7 shows the posterior distributions for the dimensionless gravitational wave 
amplitude for Dataset3 for the open and closed data using a time domain analysis. 
The posterior distribution for the open data set peaks at A = 5.5 x 1CT 15 , which is 
significantly below the injected value. The distribution for the closed data set runs up 
against the boundary of the prior range, and it is not clear that detection has been made 
in this case. 
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Figure 5. Posterior distributions for the gravitational wave amplitude from a time 
domain analysis of Datasetl for the open (left) and closed (right) data. 




Figure 6. Posterior distributions for the gravitational wave amplitude from a time 
domain analysis of Dataset2 for the open (left) and closed (right) data. 



Returning to the frequency domain analysis, and now letting the spectral slope of 
the GW spectrum vary, I found the posterior distributions for the spectral slope 7 and 
the GW amplitude A shown in Figure 8. As expected, the spread in the amplitude 
distribution is wider than when 7 was held fixed. The distribution for 7 is biased a little 
high relative to the injected value of 7 = —13/3, which may be due to the procedure 
used to perform the Fourier transform, or more likely, to interplay with the red noise 
model used to account for the timing model errors. 
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Figure 8. Posterior distributions for the gravitational wave amplitude (left panel) 
and the spectral slope (right panel) for the open Datasetl data. 



3. Discussion 

I was able to get reasonable results from the time domain and frequency domain analyses. 
There does appear to be a bias in the time domain estimates for the gravitational wave 
amplitude, which I attribute to a mis-match between the spectral model (a simple 
power law) and what is likely present in the data (a power law that flatens out at low 
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frequencies). This should be easy to fix with a suitable change to the spectral model. 

I remain puzzled as to why a simultaneous fit to the timing model and the 
gravitational wave correlation function did not produce better results. This could be 
due to an implementation error, or it may be that the this approach is flawed in some 
more serious way. The frequency domain analysis suggests that modeling the errors in 
the timing model as red "modeling" noise is quite effective. I am still a little uneasy 
about the frequency domain analysis as the procedure I used to transform the data was 
fairly Byzantine. 

As a general rule I don't like to specify the model used in the analysis in too rigid 
a fashion. I prefer to let the data select the best model. In future I would like to 
use a full RJMCMC analysis that is able to transition between models. For example, 
when I look at the distributions for some of the red noise parameters it is clear that 
they are poorly constrained by the data. A RJMCMC analysis would automatically 
discard these parameters from the model. This approach may also cure some of the ills 
of the timing model analysis, where again it was clear that many of the timing model 
parameters were poorly constrained. 

I have some ideas about how to speed up the interminably slow time domain 
analysis. There the bottle neck comes from having to invert a large non-sparse matrix. 
Suppose however that we have found an approximate form for the matrix. We can then 
use the Cholesky factors to transform to a data representation where the a samples 
are approximately uncorrelated in time and space {i.e. in both a, (3 and i,j). In this 
transformed representation "nearby" correlation matrices will be almost diagonal, and 
fast perturbative methods can be used to compute their inverse. 
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